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Abstract 

The use of parity-check gates in information theory has proved to be very efficient. In particular, 
error correcting codes based on parity checks over low-density graphs show excellent performances. 
Another basic issue of information theory, namely data compression, can be addressed in a similar 
way by a kind of dual approach. The theoretical performance of such a Parity Source Coder can 
attain the optimal limit predicted by the general rate-distortion theory. However, in order to 
turn this approach into an efficient compression code (with fast encoding/decoding algorithms) 
one must depart from parity checks and use some general random gates. By taking advantage 
of analytical approaches from the statistical physics of disordered systems and SP-like message 
passing algorithms, we construct a compressor based on low-density non-linear gates with a very 
good theoretical and practical performance. 
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I. INTRODUCTION 



Information theory and statistical physics have common roots. In the recent years, this 
interconnection has found further confirmation in the analysis of modern error correcting 
codes, known as Low Density Parity Check (LDPC) codes , by statistical physics 

methods 0, O, 0] . In such codes, the choice of the encoding schemes maps into a graphical 
model: the way LDPC codes exploit redundancy is by adding bits which are bound to satisfy 
some random sparse linear set of equations - the so called parity checks. Such equations are 
used in the decoding phase to reconstruct the original codeword (and hence the message) 
from the corrupted signal. The parity checks can be represented by a graph indicating which 
variables participate in each check. The randomness and the sparsity of the parity checks 
is reflected into the characteristics of the associated graphs, a fact that makes mean-field 
type statistical physics methods directly applicable. Thanks to the duality between channel 
coding and source coding j3|[, similar constructions can be used to perform both lossless and 
lossy data compression (ll, |q| • It must be mentioned that while there exist some practical 
algorithms for the lossless source coding which are very efficient P], much less work has been 
done so far for the lossy case Q|. In this work we present a new coding technique which 
consists in a generalization of parity-check codes for lossy data compression to non-linear 



codes 



This issue has been addressed recently using methods from the statistical physics 



of disordered systems: a non-linear perceptron [1^ or a satisfiability problem have been 
used to devel op pra ctical source coding schemes. Other recent advances in the field can be 
found in QlllQ- 

We consider the following problem. We have a random string of uncorrelated bits 
xi, . . . xm (prob(xa = 0) = prob(a;a = 1) = 1/2) and we want to compress it to the shorter 
coded sequence a"i, . . .aj^ (encoding). The rate R of the process is i? = N/M. Once we 
recover the message (decoding), we may have done some errors and thus we are left in prin- 
ciple with a different sequence xl, . . -x^j. The number of different bits normalized by the 
length of the string is the distortion, 

1 ^ 

a=l 

The Shannon theorem states that the minimum rate at which we can achieve a given dis- 
tortion D is Rmin = 1 — H2{D), where H2{D) is the binary entropy of the probability 



distribution used to generate the Xq's. This Rmin is the value one has to compare to in order 
to measure the performance of a compression algorithm. This problem is a simple version 
of the lossy data compression problem. In real applications one is often more interested in 
compression through quantization of a signal with letters in a larger alphabet (or continu- 
ous signal Here we shall keep to the memoryless case of uncorrelated unbiased binary 
sources, and we hope to be able to generalize the present approach to more realistic cases 
in the near future. 



A. Constraint Satisfaction Problems 



The underlining structure of the protocol that we are going to introduce is provided by 
a constraint satisfaction problem (CSP). A CSP with N variables is defined in general by a 
cost function 

M 

E[a] = Y,ea{W^ea}) , (2) 

a=l 

where the function node Eaici, . . -(TKa) involves a subset of Ka variables randomly chosen 
among the whole set. The problem is fully defined by choosing some Ea and an instance 
is specified by the actual variables involved in each constraint. A useful representation for 
a CSP is the factor graph (Fig. Q left). The two kinds of nodes in this graph are the 
variables and the constraints. For the sake of simplicity we shall take the functions Ea to 
have values in {0, 2} (the clause is resp. SAT or UNSAT) and the variables CTj will be Ising 
spins (cTj = ±1). A somewhat special case of CSP which has been studied extensively occurs 
when Ka = K for any a, and we shall restrict ourselves to this case in what follows. One 
can then show that the degree of a given variable is Poisson distributed with mean value 
KM/N and that the typical length of a loop is 0{\ogN / \og{KM /N)). We are interested 
in the limit M, oo, where a = M/N is fixed and plays the role of a control parameter. 
Once a problem is given in terms of its graph representation, an instance corresponds to a 
given graph chosen among all the legal ones with uniform probability. We will go back to 
this problem in section IHI We first show how a CSP can be used as a tool for compressing 
data. 
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FIG. 1: Left: Factor graph for a constraint satisfaction problem with N = 7 variables (circles) and 
M = 5 constraints (squares). In this example the connectivity of the function nodes is kept fixed, 
that is Ka = i^' in the notations of the text. Right: Message passing procedure {K = 6): The 
probability qa^o{ua^o) depends on all the probabilities qh-^i{ui,^^i) , i = 1, . . .5 (see text). 

B. Encoding 

We use the initial word Xi,X2, ■ ■ ■ xm to construct a set of M constraints between 
boolean variables Ui, cr2, . . . a^. The factor graph is supposed to be given. Each constraint 
a involves Ka variables and is defined by two complementary subsets of configurations Sq 
and Si- The value of Xa controls the role of the subsets as follows: If = then all the 
configurations {(Xi . . . dKa} G Sq satisfy the constraint, i.e. Sa = 0; all the configurations 
{(Ti . . . cTA'a} G Si do not satisfy the constraint and thus Sa = 2. If Xq = 1 the role of the two 
subsets is exchanged. This defines the CSP completely and we can look for a configuration 
of (Tj that minimizes the number of violated constraints. This is the encoded word and the 
rate of the process is -R = 

C. Decoding 

Given the configuration ai, (72, . . . (Ta?, we compute for each node a whether the variables 
(jjo, . . . (jjo are in Sq (in this case we set x* = 0) or in Si (leading to x* = 1). Since a cost 
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FIG. 2: The ground state energy density for the XORSAT problem at a = 2.0 versus K. We also 
show the corresponding Shannon bound as computed from rate-distortion theory, Esh = 'iaDsh, 
where Dsh is such that 1 — H2{Dsh) = 1/a. 



e = 2 is paid for each UNSAT clause, the energy is related to the distortion by 
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(3) 



D. The Parity Source Coder 



In order to give an example, we apply these ideas to the case where the CSP is a parity 
check (the optimization problem is then called XORSAT). In other words, the cost of the 
constraint a is written as 

£, = l + (-irJ]a, . (4) 



This Parity Source Coder (PSC) is the counterpart of the LDPC codes in error-correcting 
codes and then it seems natural to expect a good performance from it. 

In Fig. 121 we show that the ground state energy of the XORSAT problem, that is the 
theoretical capacity of the PSC, quickly approaches the Shannon bound as K increases jlS^ . 
A PSC with checks of degree K > 6 has then a theoretical capacity close to the optimal 
one. The problem here is that there does not exist any fast algorithm that can encode a 



string (for example, the SID algorithm that we discuss in the next section is known not to 
converge). 

We will thus investigate a new kind of constraint satisfaction problem, using non-linear 
nodes, with (almost) the same good theoretical performance as the XORSAT problem, but 
with a fast encoding algorithm. Before we do this, we introduce in the next section the 
general formalism used in the study of non-linear nodes. 

II. THE GENERAL FORMALISM 

Given a CSP at some a, we are interested in knowing whether a typical instance of the 
problem is satisfiable (i.e. all the constraints can be satisfied by one global configuration) as 
the number of variables goes to infinity. Generally speaking, there will be a phase transition 
between a SAT regime (at low a the problem has a small number of constraints and thus is 
solvable - at least in principle ) and an UNSAT regime where there are too many constraints 
and one cannot find a zero-energy configuration. In this UNSAT regime, one wants to 
minimize the number of violated constraints. This kind of problem can be approached by 
message passing algorithms. 

Due to the large length of typical loops, the local structure of a typical graph is equivalent 
to a tree, and this is crucial in what follows. We introduce the cavity bias Ua-,i G { — 1, 0, +1} 
sent from a node a to a variable i. A non-zero message means that the variable i is requested 
to assume the actual value of Ua^i in order to satisfy the clause a. If Ua^i = the variable i is 
free to assume any value. It is clear that this message sent to i should encode the information 
that a receives from all the other variables attached to it. In order to clarify this point, we 
refer to Fig. ^ (right), that is we focus on a small portion of the graph. As the graph is locally 
tree-like, the variables ai, i = 1,2, K — 1, are only connected through clause a if is large 
enough. If a is absent, the total energy of the system can be written as {ai, ...(Jk-i) = 
A — ^fSi' hiai. We have used here the assumption that the probability of these ai factorizes. 
This is again motivated by the local tree structure, but we shall go back to this point. 
After clause a is added, E^^^{aQ,ai, ...aK-i) = {ai, ...ax-i) +£^a(c"0) o"!; ■■■O'k-i) and the 
variables rearrange in order to minimize the total cost. The minimization then defines ^(ao) 
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from 

K-l 



^ - \hi\ + e{ao) = min E^+^(cro, cti, ...cta'-i) • (5) 



cri,...(TX-l 
1=1 



This e(o"o) is then the cost to be paid for adding one variable with a fixed value ctq. Without 
losing generality, it can be written as 

e(ao) = Aa->0 - OQUa-^Q , (6) 

where Wa^o is the cavity bias acting on the new variable and Aa_>o is related to the actual 
energy shift by 

AE = min [^"^+'(^0, ai, . . . cjk^x) - E^{(Tu . . . ax-i)] = A„^o - \ua^o\ ■ (7) 

ao,...(JK-l 

Given that Ea can be or 2, depending on the set of fields hi, . . . hx-i one has four possi- 
bilities: 

£(+1) = and £(-1) = 0^n = 0,A = (8) 

£(+!) = and = 2 ^ n = +1 , A = 1 (9) 

e{+l) = 2 and e{-l) = n = -l,A = l (10) 

£(+1) = 2 and ^(-l) = 2 ^ u = , A = 2. (11) 

In other words, a non-zero message u is sent from clause a to variable cq only if the satisfia- 
bility of clause a depends on (Tq. A null message {u = 0) can occur in the two distinct cases 
© and (HH). 

The main hypothesis we have done so far consisted in assuming that the two variables 
are uncorrelated if they are distant (the energy is linear in the cTj's). It turns out that, in 
a large region of the parameter space |2^, including the regime we are interested in, 
this is not correct. This is due to the fact that the space of solutions breaks into many 
disconnected components if a is greater than a critical value. In order to deal with this case, 
one has to introduce, for each directed link, a probability distribution of the cavity biases, 
namely q{u) = ri~^Su,+i + f]~^u~i + (1 — 77"*" —f]~)^u,o- The hypothesis of no correlation holds 
if the phase space is restricted to one component. The interpretation of qa^i{ua^i) is the 
probability that a cavity bias Ua-^i is sent from clause a to variable i when one component 
is picked at random . According to the rules ()8p9pi()fll|l , and with the topology of Fig. [T] 
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(right) as a reference for notations, the Survey Propagation (SP, 22]) equations are then 



Vl-^o oc Prob 
Va^o oc Prob 

7a^0 



° OC Prob 



(^6-.ii)bgj,\a 5 • • • {'"'b~^iK-i)beiK-.i\a 



(£,( + 1) < iai-l)) 
(Sai + l) > iai-l)) 
(Sai + l) = Sai + l)) 



-yAE 



-yAE 



-yAE 



, (12) 



(13) 



(14) 



where the energy shift AE is given in eq. (|7j) and is non-zero only when the constraint is 
UNSAT for any value of ctq (cf. eq. (jllj) ). The crucial reweighting term exp{—yAE) thus 
acts as a "penalty" factor each time a clause can not be satisfied. This term is necessary in 
the UNSAT regime which we explore here (while simpler equations with y = oo are enough 
to study the SAT phase). 



A. Encoding: SP and decimation 

For each fixed value of y, the iterative solution for the SP equations can be implemented 



on a single sample [2^, i.e. on a given graph where we know all the function nodes involved. 
The cavity probability distributions q(ti)'s are updated by picking up one edge at random 
and using (jHEEIEl)- This procedure is iterated until convergence. This yields a set of 
messages {?7~_^j, Va^v Va^i} ^ach edge of the factor graph which is the solution of the SP 
equations. This solution provides very useful information about the single instance that can 
be used for decimation. As explained in , the finite value of y used for this purpose must 
be properly chosen. Given the solution for this y, one can compute the distribution P{Hi) of 
the total bias Hi = Xlaei "^^^j '^^ each variable. One can then fix the most biased variables, 
i.e. the one with the largest \P{Hi = +1) — P{Hi = — 1)|, to the value suggested by the 
P{Hi) itself. This leads to a reduced problem with — 1 variables. After solving again the 
SP equations for the reduced problem, the new most-biased variable is fixed and one goes 
on until the problem is reduced to an "easy" instance. This can be finally solved by some 
conventional heuristic (e.q. walksat or simulated annealing). A significant improvement of 

nn 

the decimation performance can be obtained by using a backtracking procedure [2^ |2J| : At 
each step we also rank the fixed variables with a strongly opposed bias and unfix the most 
"unstable" variable with finite probability. The algorithm described here is called Survey 
Inspired Decimation (SID) and its peculiar versions have been shown to be very useful in 
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many CSP problems recently. Unfortunately, the basic version of SID does not work for the 
XORSAT problem because of the symmetric character of the function nodes that is reflected 
in a large number of unbiased variables (some improvements appear to be possible 25|). 



B. Statistical analysis: Theoretical performance 

One can perform a statistical analysis of the solutions of the SP equations by population 



dynamics 21[. The knowledge of the function node allows to build up a table of values of 
u for each configuration of the local fields hi, ... , h^-i (this is done according to the mini- 
mization procedure described above). We then start with some initial (random) population 
of ffi = {rj^, rj^, ?7~}, for i = 1, . . . N. We extract a Poisson number p of neighbors and p prob- 
abilities fji-^, . . .fji^. According to these weights, p biases Ui,...Up are generated and their 
sum computed, hi = ^^=1 "^a- O^ice we have — 1 of these fields we perform the minimiza- 
tion in (0) and compute the new probability for Ua according to the rules (j8l9llUllip . The 
whole process is then iterated until a stationary distribution of the cavity biases is reached. 
This method for solving the SP equations is very flexible with respect to the change of the 
choice of the node, because this choice just enters in the calculation once at the beginning 
of the algorithm in order to initialize some tables. One can also study problems with many 
different types of nodes in a given problem: in this case, one among them is randomly chosen 
each time the updating is performed. We finally stress that once the probability distribu- 
tion of the cavity biases is known the ground state energy of the problem can be computed 



2l|. The expression for the energy in term of the 



according to the formalism introduced in 
probability distributions as 

E{a) = max $(«,?/) , (15) 

y 

Ha,y) = + l)a]\ogA,{y) - {K - l)alog Ap+i(y)} , (16) 

(p ^ \ 

-y^l'^o.l J , (17) 
a=l a=l / 

the overline standing for an average over the Poisson distribution and the choice of the 
distributions qi,...qp in the population. Recalling that R = l/a, the average distortion 
(i.e., the theoretical capacity) of a compressor based on this CSP is computed through 
equations Q and (fT^. 
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FIG. 3: Top: The free energy $(a,y) of a number of symmetric nodes with X = 7 at a = 1.4, 
classified according to the rule given in the text. Bottom: The asymmetry \rij^ — ?7_ | of these nodes 
(left) is measured as well as the "paramagnetic degree" r^o = 1 ~ ^- ~ ^+ (right). Note that the 
usual 7-XOR node corresponds to the {2, 0, 2, 0} case in this notation. 

Let us study here as an example a family of function nodes whose energy is fully invariant 
under permutations of the arguments. These nodes can be classified according to the energy 
of the node for a given value of the "magnetization" m = Xlili '^i- keep to odd K and 
label a particular node by the sequence of values {e{m = K),e{m = K — 2), . . . , e{m = 1)}. 
In Fig.Elwe report the results of the population dynamics algorithm for some types of nodes 
at K = 7. According to eq. (fT3|) . the ground state energy E{a) corresponds to the maximum 
over y of the free energy $(«,?/) represented in the top plot. The theoretical performance of 
these nodes are quite close to the PSC case (the XOR node, characterized by the sequence 
{2,0,2,0}). 

III. NON-LINEAR NODES 

We consider in this section the function nodes that give the best performance for com- 
pression, both form the theoretical point of view (theoretical capacity close the parity-check 
nodes) and from the algorithmic aspect (the SID algorithm at finite y is found to converge in 



10 



0.25 




0.00 ^ — ' ' ' ' ' ' ' ' ' ' 

1 1.2 1.4 1.6 1.8 2 

a = l/rate 

FIG. 4: The ground state energy for a system with 30 non-Unear nodes. We also plot the Shannon 
bound and the performance of the SID algorithm for the A' = 6 case. 

the UNSAT regime, thus giving an explicit encoding algorithm). These non-linear function 
nodes are defined as follows. We recall that the output Exor of a i^-XORSAT node is given 
by twice the sum modulo 2 of the input bits. We label each configuration a G {0, 1}^ with 
an integer / G [1,2^] and consider a random permutation tt of the vector {1, 2, . . . , 2^}. 
Then, we can associate the output of the random node by letting e'^'^^{V) = Exoni'^iO)- 
this way we are left with a random but balanced output which can be different from XOR. 
Also, it is clear that they are not more defined by a linear formula over the boolean variables. 

We can take advantage of the formalism introduced in the previous section in order 
to study the theoretical performance of these new function nodes. In particular, we have 
used 10 to 30 different random nodes to build the factor graphs. In order to improve the 
performance, we have also forbidden "fully-canalizing" nodes, that is nodes whose SAT 
character depends on just one variable. 

In Fig. m we show our results. The ground state energy is shown to quickly approach the 
Shannon bound as K increases and for any a. As an example, at a = 2 (corresponding to 
a compression rate R = 1/2) the difference between the K = 8 value and the theoretical 
limit is ^ 2%. This looks very promising from the point of view of data compression. 
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Furthermore, the results obtained from the SID (same plot) show that in this case the 
algorithm does converge and its performance is very good. It should be noted that when 
K becomes large the difference between SP and the Belief Propagation (BP) becomes small 
(this can be seen for instance in the analysis of Q]), so in fact BP does also provide a good 
encoding algorithm for K > 6. At fixed the time needed to solve the SP equations is 
0{N\ogN). The actual computational time required by the decimation process is of the 
same order and it slightly depends on the details of the SID algorithm {e.g. the number 
of variables fixed at each iteration, whether or not a backtracking procedure is used, which 
kind of heuristic is adopted to solve an "easy" reduced instance, the proper definition of 
the latter, etc.). The dependence on K is exponential. Thus, even if increasing K is good 
from the theoretical point of view, it turns out to be very difficult to work at high K. In 
practice, it takes a few hours to compress a string of = 1000 bits at A' = 6 by using our 
general purpose software. We think that some more specified code would lead to a better 
performance. 

To conclude, we have shown how the methods of statistical mechanics, properly adapted 
to deal with a new class of constraint satisfaction problems, allow to implement a new 
protocol for data compression. The new tool introduced here, the non-linear gates, looks 
very promising for other practical applications in information theory. 
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